Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PeekByte and Next optimization #32

Merged
merged 1 commit into from
Sep 16, 2024

Conversation

klauspost
Copy link
Contributor

Adds call to get a single byte and add Next happy path.

Will send msgp PR with benchmarks.

Adds call to get a single byte and add Next happy path.
@philhofer philhofer merged commit 20a13a1 into philhofer:master Sep 16, 2024
@klauspost klauspost deleted the add-peekbyte branch September 16, 2024 15:00
klauspost added a commit to klauspost/msgp that referenced this pull request Sep 16, 2024
Utilize philhofer/fwd#32 to peek bytes

Eliminate bounds checks, mostly on reading sizes, but also integers.

```
benchmark                            old MB/s     new MB/s     speedup
BenchmarkReadMapHeaderBytes-32       1136.75      1136.89      1.00x
BenchmarkReadArrayHeaderBytes-32     1133.52      1149.34      1.01x
BenchmarkTestReadBytesHeader-32      1200.18      1171.21      0.98x
BenchmarkReadNilByte-32              2907.50      2176.63      0.75x
BenchmarkReadFloat64Bytes-32         4308.86      3869.64      0.90x
BenchmarkReadFloat32Bytes-32         2354.55      2332.21      0.99x
BenchmarkReadBoolBytes-32            863.66       1124.07      1.30x
BenchmarkReadTimeBytes-32            3710.27      3668.07      0.99x
BenchmarkReadMapHeader-32            228.74       391.76       1.71x
BenchmarkReadArrayHeader-32          243.90       454.31       1.86x
BenchmarkReadNil-32                  127.42       191.55       1.50x
BenchmarkReadFloat64-32              1046.08      1142.49      1.09x
BenchmarkReadFloat32-32              595.67       669.65       1.12x
BenchmarkReadInt64-32                422.29       798.35       1.89x
BenchmarkReadUintWithInt64-32        186.59       371.28       1.99x
BenchmarkReadUint64-32               282.87       377.76       1.34x
BenchmarkReadIntWithUint64-32        465.30       684.49       1.47x
BenchmarkRead16Bytes-32              1021.15      1021.38      1.00x
BenchmarkRead256Bytes-32             5324.42      6166.06      1.16x
BenchmarkRead2048Bytes-32            8405.41      8382.61      1.00x
BenchmarkRead16StringAsBytes-32      1102.72      1321.71      1.20x
BenchmarkRead256StringAsBytes-32     6959.78      6627.49      0.95x
BenchmarkRead16String-32             320.75       378.46       1.18x
BenchmarkRead256String-32            2066.40      2242.78      1.09x
BenchmarkReadComplex64-32            900.13       1064.21      1.18x
BenchmarkReadComplex128-32           1809.88      2028.11      1.12x
BenchmarkReadTime-32                 1362.46      1425.79      1.05x
```

Apologies for the noisy benchmark. `BenchmarkReadNilByte` and `BenchmarkReadFloat64Byte` does not touch new code, so it seems to be "micro-bench-itis", where random deltas show up. Bench takes ~30m to run for whatever reason.

The compiler is not clever enough to track back to `l := len(b)`, so it inserts a bounds check. Also `len(b) < 3` and `big.Uint16(b[1:])` causes a bounds check. So we rewrite those to avoid it.

This should cover the lowest hanging fruits.
klauspost added a commit to tinylib/msgp that referenced this pull request Sep 18, 2024
* Improve read speed

Utilize philhofer/fwd#32 to peek bytes

Eliminate bounds checks, mostly on reading sizes, but also integers.

```
benchmark                            old MB/s     new MB/s     speedup
BenchmarkReadMapHeaderBytes-32       1136.75      1136.89      1.00x
BenchmarkReadArrayHeaderBytes-32     1133.52      1149.34      1.01x
BenchmarkTestReadBytesHeader-32      1200.18      1171.21      0.98x
BenchmarkReadNilByte-32              2907.50      2176.63      0.75x
BenchmarkReadFloat64Bytes-32         4308.86      3869.64      0.90x
BenchmarkReadFloat32Bytes-32         2354.55      2332.21      0.99x
BenchmarkReadBoolBytes-32            863.66       1124.07      1.30x
BenchmarkReadTimeBytes-32            3710.27      3668.07      0.99x
BenchmarkReadMapHeader-32            228.74       391.76       1.71x
BenchmarkReadArrayHeader-32          243.90       454.31       1.86x
BenchmarkReadNil-32                  127.42       191.55       1.50x
BenchmarkReadFloat64-32              1046.08      1142.49      1.09x
BenchmarkReadFloat32-32              595.67       669.65       1.12x
BenchmarkReadInt64-32                422.29       798.35       1.89x
BenchmarkReadUintWithInt64-32        186.59       371.28       1.99x
BenchmarkReadUint64-32               282.87       377.76       1.34x
BenchmarkReadIntWithUint64-32        465.30       684.49       1.47x
BenchmarkRead16Bytes-32              1021.15      1021.38      1.00x
BenchmarkRead256Bytes-32             5324.42      6166.06      1.16x
BenchmarkRead2048Bytes-32            8405.41      8382.61      1.00x
BenchmarkRead16StringAsBytes-32      1102.72      1321.71      1.20x
BenchmarkRead256StringAsBytes-32     6959.78      6627.49      0.95x
BenchmarkRead16String-32             320.75       378.46       1.18x
BenchmarkRead256String-32            2066.40      2242.78      1.09x
BenchmarkReadComplex64-32            900.13       1064.21      1.18x
BenchmarkReadComplex128-32           1809.88      2028.11      1.12x
BenchmarkReadTime-32                 1362.46      1425.79      1.05x
```

Apologies for the noisy benchmark. `BenchmarkReadNilByte` and `BenchmarkReadFloat64Byte` does not touch new code, so it seems to be "micro-bench-itis", where random deltas show up. Bench takes ~30m to run for whatever reason.

The compiler is not clever enough to track back to `l := len(b)`, so it inserts a bounds check. Also `len(b) < 3` and `big.Uint16(b[1:])` causes a bounds check. So we rewrite those to avoid it.

This should cover the lowest hanging fruits.

* Fix up ReadExactBytes - do ReadArrayHeaderBytes as well.

* Don't stop the timer. Makes benchmarks take forever. Simplify EndlessReader
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants